Skip to content

Errors (versions)#3187

Draft
matt-aitken wants to merge 7 commits intomainfrom
errors-add-version-filter
Draft

Errors (versions)#3187
matt-aitken wants to merge 7 commits intomainfrom
errors-add-version-filter

Conversation

@matt-aitken
Copy link
Member

  • Added versions filtering on the Errors list and page
  • Added errors stacked bars to the graph on the individual error page

@changeset-bot
Copy link

changeset-bot bot commented Mar 5, 2026

⚠️ No Changeset found

Latest commit: e609e8b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@matt-aitken matt-aitken marked this pull request as draft March 5, 2026 23:24
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

Walkthrough

Adds version-aware error grouping and alerting across frontend, backend, and DB. New UI: LogsVersionFilter and exported VersionsDropdown, status filters, chart legend styling, and list/status badges. Backend presenters (ErrorGroupPresenter, ErrorsListPresenter) now accept and return versions and state; introduces ErrorGroupState, ErrorGroupActions service, deliver/evaluate alert services (DeliverErrorGroupAlertService, ErrorAlertEvaluator), alerts worker tasks, and email template for error alerts. ClickHouse query builders for per-version aggregations and "since" queries were added. Prisma schema and SQL migrations create ErrorGroupState and add project alert config fields.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is incomplete: missing issue reference, incomplete checklist, no testing details provided, and lacks structured sections as required by the template. Add issue reference (Closes #), complete the checklist, describe testing steps, and include changelog details in the provided template format.
Docstring Coverage ⚠️ Warning Docstring coverage is 19.23% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Errors (versions)' directly relates to the main objective of adding version filtering to the errors list and page, and is concise and clear.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch errors-add-version-filter
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use TruffleHog to scan for secrets in your code with verification capabilities.

Add a TruffleHog config file (e.g. trufflehog-config.yml, trufflehog.yml) to your project to customize detectors and scanning behavior. The tool runs only when a config file is present.

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

Open in Devin Review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Errors list sparklines don't reflect the version filter

When a user filters errors by version on the list page (errors._index/route.tsx), the occurrencesPromise at line 120-134 chains from the filtered listPromise but calls presenter.getOccurrences() without passing the versions filter. This means the sparkline activity graphs shown in each error row will display occurrences across ALL versions, even when the list itself is filtered to specific versions. This may be intentional (showing full activity context) but could confuse users who expect the sparklines to match the filtered view.

(Refers to lines 120-134)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +213 to +219
const data = buckets.map((epoch) => {
const point: Record<string, number | Date> = { date: new Date(epoch * 1000) };
for (const version of sortedVersions) {
point[version] = byBucketVersion.get(`${epoch}:${version}`) ?? 0;
}
return point;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Version string used as Record key can collide with reserved date key, breaking the chart

In getOccurrences, version strings are used as dynamic keys in the same Record<string, number | Date> that already has a date key (apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts:214-217). If a task_version value is literally "date" (or empty-string mapped to "unknown" which is safe, but "date" is not guarded), the version count (a number) overwrites the date field (a Date). Downstream in ActivityChart (apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx:428), d.date instanceof Date would be false and new Date(smallNumber).getTime() would produce a nonsensical timestamp (epoch 1970), completely breaking the chart's x-axis. The same collision can occur with "__timestamp" added at route.tsx:427.

Prompt for agents
In apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts, the getOccurrences method (lines 213-219) uses version strings as keys in a Record that also has a "date" key. To prevent collisions, either:

1. Prefix version keys with a safe namespace (e.g., "v:" prefix) and update the chart consumer to strip the prefix when displaying labels, OR
2. Restructure the data to use a separate nested object for version counts, e.g.:
   { date: Date, versions: Record<string, number> }
   and update the ChartRoot consumer in the fingerprint route (route.tsx lines 475-496) to match.

Also in apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx line 427, the __timestamp key added via spread could similarly collide if a version were named "__timestamp". Guard against both reserved keys ("date" and "__timestamp").
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@matt-aitken matt-aitken force-pushed the errors-add-version-filter branch from cd9b5ea to e609e8b Compare March 16, 2026 23:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsx (1)

560-574: ⚠️ Potential issue | 🔴 Critical

Missing ERROR_GROUP case in alertTypeTitle will cause runtime error.

The ProjectAlertType enum now includes ERROR_GROUP (added in this PR's schema changes), but alertTypeTitle doesn't handle it. When an alert channel with ERROR_GROUP type is displayed in the table (line 237), this will throw an error.

🐛 Proposed fix
 export function alertTypeTitle(alertType: ProjectAlertType): string {
   switch (alertType) {
     case "TASK_RUN":
       return "Task run failure";
     case "TASK_RUN_ATTEMPT":
       return "Task attempt failure";
     case "DEPLOYMENT_FAILURE":
       return "Deployment failure";
     case "DEPLOYMENT_SUCCESS":
       return "Deployment success";
+    case "ERROR_GROUP":
+      return "Error group";
     default: {
       throw new Error(`Unknown alertType: ${alertType}`);
     }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsx
around lines 560 - 574, The switch in alertTypeTitle(ProjectAlertType) is
missing a handler for the new "ERROR_GROUP" enum value and will throw for that
case; update the alertTypeTitle function to add a case for "ERROR_GROUP" (e.g.,
case "ERROR_GROUP": return "Error group";) so all ProjectAlertType variants are
handled and the default throw is avoided when ERROR_GROUP alerts are rendered
(used elsewhere where alerts are displayed).
apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts (1)

64-85: ⚠️ Potential issue | 🔴 Critical

getSummary() declares ErrorGroupSummary as return type but returns an incomplete object without state.

The getSummary() method at line 249 declares a return type of Promise<ErrorGroupSummary | undefined>, where ErrorGroupSummary.state is required. However, the method returns an object (lines 276–285) that omits the state field entirely. While the call() method does assign state after retrieval (line 149), this violates the type contract of getSummary(). Either make state optional in the type definition, include state in getSummary()'s return object, or update getSummary()'s return type to reflect the incomplete object it actually returns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts` around lines 64
- 85, getSummary() claims to return ErrorGroupSummary but omits the required
state field; update getSummary() to include a state property in its returned
object (either populated from the same data source used by call() or a sensible
default with fields like status, resolvedAt, resolvedInVersion, resolvedBy,
ignoredUntil, ignoredReason, ignoredByUserId, ignoredUntilOccurrenceRate,
ignoredUntilTotalOccurrences) so the returned object matches the
ErrorGroupSummary type, and remove any subsequent reassignment in call() that
assumes state is missing.
🧹 Nitpick comments (4)
internal-packages/clickhouse/src/index.ts (1)

248-263: Add @crumbs tracing annotations for the new error-query paths.

Please add crumb comments around the newly added errors query builder exposure to align with repository tracing standards.

As per coding guidelines, **/*.{ts,tsx,js}: “Add crumbs as you write code using // @crumbs comments or // #region @crumbs blocks for agentcrumbs debug tracing”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal-packages/clickhouse/src/index.ts` around lines 248 - 263, The new
errors getter (get errors) that exposes multiple query-builder helpers
(getGroups, getInstances, getHourlyOccurrences, affectedVersionsQueryBuilder,
listQueryBuilder, occurrencesListQueryBuilder, createOccurrencesQueryBuilder,
createOccurrencesByVersionQueryBuilder, occurrenceCountSinceQueryBuilder,
activeErrorsSinceQueryBuilder, occurrenceCountsSinceQueryBuilder) needs
repository-standard tracing crumbs; wrap the errors block with // `@crumbs`
comments (or a // `#region` `@crumbs` ... // `#endregion` `@crumbs` block) immediately
before and after the getter so agentcrumbs can trace calls to these functions,
ensuring the comments surround the get errors { ... } block and reference the
same function names for clarity.
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx (1)

630-632: Consider a more reliable approach for closing the dialog after submission.

Using setTimeout(onClose, 100) is fragile—if the submission takes longer than 100ms to start, the dialog may close before the request begins. Consider using the fetcher's state to close the dialog when submission completes:

♻️ Suggested alternative
// In CustomIgnoreForm, use useEffect to close on success:
import { useEffect } from "react";

// ...

useEffect(() => {
  if (fetcher.state === "idle" && fetcher.data?.ok) {
    onClose();
  }
}, [fetcher.state, fetcher.data, onClose]);

// Remove setTimeout from onSubmit

Or simply close immediately on submit and let the optimistic UI handle it if that's the intended behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx
around lines 630 - 632, The current onSubmit handler uses setTimeout(onClose,
100) which is fragile; remove the setTimeout call in the onSubmit prop of
CustomIgnoreForm and instead close the dialog when the fetcher completes by
adding a useEffect in the CustomIgnoreForm component that watches fetcher.state
and fetcher.data (e.g., useEffect(() => { if (fetcher.state === "idle" &&
fetcher.data?.ok) onClose(); }, [fetcher.state, fetcher.data, onClose]));
alternatively, if you intend optimistic UI, call onClose() immediately in
onSubmit and omit the timeout.
apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts (1)

30-46: Prefer a type alias for the payload shape.

This new model should use type instead of interface to stay aligned with the repo's TypeScript rules.

As per coding guidelines, **/*.{ts,tsx}: Use types over interfaces for TypeScript.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts` around
lines 30 - 46, Replace the interface declaration ErrorAlertPayload with a type
alias using the same shape (e.g., type ErrorAlertPayload = { ... }); keep the
property names and nested error object unchanged (including channelId,
projectId, classification, and the error fields like fingerprint, environmentId,
environmentName, taskIdentifier, errorType, errorMessage, sampleStackTrace,
firstSeen, lastSeen, occurrenceCount) so all references to ErrorAlertPayload and
ErrorAlertClassification continue to work; update any imports/exports if
necessary to reflect the type alias.
apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts (1)

16-26: Prefer type aliases for these internal models.

AlertableError and ResolvedEnvironment should use type aliases here to match the repo's TypeScript convention.

As per coding guidelines, **/*.{ts,tsx}: Use types over interfaces for TypeScript.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 16 - 26, Replace the two internal interfaces with type aliases to match
repo convention: convert interface AlertableError and interface
ResolvedEnvironment into type AlertableError = { ... } and type
ResolvedEnvironment = { ... }, keeping the exact same property names and types
(classification, error, environmentName) and (id, type, displayName)
respectively; update any local references if needed but do not change the shape
or visibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts`:
- Around line 130-145: getState is being called without the task identifier so
it can return the wrong ErrorGroupState when the same fingerprint exists under
multiple tasks; stop fetching summary and state in parallel and instead first
await this.getSummary(...) to read the taskIdentifier, then call
this.getState(environmentId, taskIdentifier, fingerprint) (or otherwise pass the
task identifier into getState) when you run the remaining Promise.all for
getAffectedVersions, getRunList and getState so state lookup is disambiguated by
taskIdentifier.

In `@apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts`:
- Around line 104-112: The helper `#scheduleErrorAlertEvaluation` currently
enqueues a job using the same id pattern "evaluateErrorAlerts:${projectId}"
that's used by the evaluator's delayed self-chain, causing deduplication/no-op
if a future run is already queued; change the id for this immediate kick to be
unique (for example append a timestamp or a "now" suffix like
"evaluateErrorAlerts:${projectId}:now:${Date.now()}") so alertsWorker.enqueue
always creates a distinct job when triggered immediately after channel
creation/edit; keep the job name "v3.evaluateErrorAlerts" and payload identical,
only alter the id generation logic in `#scheduleErrorAlertEvaluation`.

In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts`:
- Around line 239-243: The code logs and throws the raw webhook URL
(webhookProperties.data.url) from the channel object (type
ProjectAlertChannelType), which can leak embedded credentials; update
deliverErrorGroupAlert.server.ts to sanitize/redact webhookProperties.data.url
before any logging or including it in thrown errors (e.g., replace
host/path/query credentials with "[REDACTED]" or strip auth/query params) and
ensure any thrown Error messages or processLogger entries use the redacted
value; apply the same change for the other occurrences around the webhook
handling block (also referenced at the second occurrence near lines 310-316) so
no raw webhook URL is emitted.

In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts`:
- Around line 178-197: The total-occurrence unignore check uses
context.occurrencesSince (window-only) so a threshold like
ignoredUntilTotalOccurrences won’t fire across multiple evaluation windows;
change the logic to compute the cumulative occurrences since the alert was
ignored and compare that to state.ignoredUntilTotalOccurrences. Specifically,
replace the context.occurrencesSince check with a call that gets counts since
state.ignoredAt (or otherwise compute occurrences since state.ignoredAt using
getOccurrenceCountsSince/your occurrence-count helper) and use that cumulative
count in the if that references state.ignoredUntilTotalOccurrences and
state.ignoredAt.
- Around line 102-125: The loop enqueues alerts to every envChannels entry on
every evaluator run, ignoring per-channel intervals; inside the for (const
channel of envChannels) loop (before calling alertsWorker.enqueue) add a guard
that respects the channel's configured interval by comparing it to the
evaluator's current run interval (e.g., currentRunIntervalMinutes or
evaluationIntervalMinutes): only enqueue when the current run aligns with the
channel interval (for example channel.intervalMinutes ===
currentRunIntervalMinutes or channel.intervalMinutes % currentRunIntervalMinutes
=== 0), otherwise continue; update any related loops mentioned (the other
occurrences around the same pattern at the referenced ranges) to use the same
interval-check guard.
- Around line 30-35: The constructor for ErrorAlertEvaluator has its _prisma and
_replica defaults reversed causing writes (e.g., updateErrorGroupStates) to go
to the read-only client; swap the defaults so _prisma defaults to prisma and
_replica defaults to $replica. Update the constructor parameter
order/assignments for ErrorAlertEvaluator accordingly so any write operations
use the primary Prisma client (_prisma = prisma) and reads use the replica
(_replica = $replica).
- Around line 372-387: Remove the stray comma-expression "await this, ..." so
the Prisma update is actually awaited: locate the call to
this._prisma.errorGroupState.update (the block that sets status: "UNRESOLVED"
and clears ignored/resolved fields) and change the statement to await the update
promise directly (i.e., await this._prisma.errorGroupState.update(...)) so the
database write completes before continuing (e.g., before calling selfChain()).

In `@internal-packages/emails/emails/alert-error-group.tsx`:
- Line 58: Change the default export of the component function Email to a named
export (use "export function Email(...)" instead of "export default function
Email(...)") and then update the corresponding import in the emails index module
to use a named import (e.g. import { Email } from '...') or rename on import if
needed; also update any usages that relied on the default import to reference
the named export.

---

Outside diff comments:
In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts`:
- Around line 64-85: getSummary() claims to return ErrorGroupSummary but omits
the required state field; update getSummary() to include a state property in its
returned object (either populated from the same data source used by call() or a
sensible default with fields like status, resolvedAt, resolvedInVersion,
resolvedBy, ignoredUntil, ignoredReason, ignoredByUserId,
ignoredUntilOccurrenceRate, ignoredUntilTotalOccurrences) so the returned object
matches the ErrorGroupSummary type, and remove any subsequent reassignment in
call() that assumes state is missing.

In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsx:
- Around line 560-574: The switch in alertTypeTitle(ProjectAlertType) is missing
a handler for the new "ERROR_GROUP" enum value and will throw for that case;
update the alertTypeTitle function to add a case for "ERROR_GROUP" (e.g., case
"ERROR_GROUP": return "Error group";) so all ProjectAlertType variants are
handled and the default throw is avoided when ERROR_GROUP alerts are rendered
(used elsewhere where alerts are displayed).

---

Nitpick comments:
In
`@apps/webapp/app/routes/_app.orgs`.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx:
- Around line 630-632: The current onSubmit handler uses setTimeout(onClose,
100) which is fragile; remove the setTimeout call in the onSubmit prop of
CustomIgnoreForm and instead close the dialog when the fetcher completes by
adding a useEffect in the CustomIgnoreForm component that watches fetcher.state
and fetcher.data (e.g., useEffect(() => { if (fetcher.state === "idle" &&
fetcher.data?.ok) onClose(); }, [fetcher.state, fetcher.data, onClose]));
alternatively, if you intend optimistic UI, call onClose() immediately in
onSubmit and omit the timeout.

In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts`:
- Around line 30-46: Replace the interface declaration ErrorAlertPayload with a
type alias using the same shape (e.g., type ErrorAlertPayload = { ... }); keep
the property names and nested error object unchanged (including channelId,
projectId, classification, and the error fields like fingerprint, environmentId,
environmentName, taskIdentifier, errorType, errorMessage, sampleStackTrace,
firstSeen, lastSeen, occurrenceCount) so all references to ErrorAlertPayload and
ErrorAlertClassification continue to work; update any imports/exports if
necessary to reflect the type alias.

In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts`:
- Around line 16-26: Replace the two internal interfaces with type aliases to
match repo convention: convert interface AlertableError and interface
ResolvedEnvironment into type AlertableError = { ... } and type
ResolvedEnvironment = { ... }, keeping the exact same property names and types
(classification, error, environmentName) and (id, type, displayName)
respectively; update any local references if needed but do not change the shape
or visibility.

In `@internal-packages/clickhouse/src/index.ts`:
- Around line 248-263: The new errors getter (get errors) that exposes multiple
query-builder helpers (getGroups, getInstances, getHourlyOccurrences,
affectedVersionsQueryBuilder, listQueryBuilder, occurrencesListQueryBuilder,
createOccurrencesQueryBuilder, createOccurrencesByVersionQueryBuilder,
occurrenceCountSinceQueryBuilder, activeErrorsSinceQueryBuilder,
occurrenceCountsSinceQueryBuilder) needs repository-standard tracing crumbs;
wrap the errors block with // `@crumbs` comments (or a // `#region` `@crumbs` ... //
`#endregion` `@crumbs` block) immediately before and after the getter so agentcrumbs
can trace calls to these functions, ensuring the comments surround the get
errors { ... } block and reference the same function names for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e7e10a3c-d2f2-429c-85bb-c4d210930768

📥 Commits

Reviewing files that changed from the base of the PR and between cd9b5ea and e609e8b.

📒 Files selected for processing (21)
  • apps/webapp/app/components/logs/LogsVersionFilter.tsx
  • apps/webapp/app/components/primitives/charts/ChartRoot.tsx
  • apps/webapp/app/components/runs/v3/RunFilters.tsx
  • apps/webapp/app/models/projectAlert.server.ts
  • apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts
  • apps/webapp/app/presenters/v3/ErrorsListPresenter.server.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.alerts/route.tsx
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors.$fingerprint/route.tsx
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.errors._index/route.tsx
  • apps/webapp/app/v3/alertsWorker.server.ts
  • apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts
  • apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts
  • apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts
  • apps/webapp/app/v3/services/errorGroupActions.server.ts
  • internal-packages/clickhouse/src/errors.ts
  • internal-packages/clickhouse/src/index.ts
  • internal-packages/database/prisma/migrations/20260306102053_error_group_state/migration.sql
  • internal-packages/database/prisma/migrations/20260308181657_add_error_alert_config_to_project_alert_channel/migration.sql
  • internal-packages/database/prisma/schema.prisma
  • internal-packages/emails/emails/alert-error-group.tsx
  • internal-packages/emails/src/index.tsx
🚧 Files skipped from review as they are similar to previous changes (4)
  • apps/webapp/app/components/logs/LogsVersionFilter.tsx
  • apps/webapp/app/components/runs/v3/RunFilters.tsx
  • apps/webapp/app/components/primitives/charts/ChartRoot.tsx
  • internal-packages/clickhouse/src/errors.ts

Comment on lines +130 to 145
const [summary, affectedVersions, runList, stateRow] = await Promise.all([
this.getSummary(organizationId, projectId, environmentId, fingerprint),
this.getAffectedVersions(organizationId, projectId, environmentId, fingerprint),
this.getRunList(organizationId, environmentId, {
userId,
projectId,
fingerprint,
versions,
pageSize: runsPageSize,
from: time.from.getTime(),
to: time.to.getTime(),
cursor,
direction,
}),
this.getState(environmentId, fingerprint),
]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

State lookup needs the task identifier.

ErrorGroupState is keyed by environment + taskIdentifier + fingerprint elsewhere in this PR, but this query only filters by environment and fingerprint. If the same fingerprint exists under multiple tasks, the page can show the wrong resolved/ignored state; fetching summary and stateRow in the same Promise.all also prevents you from disambiguating it.

Also applies to: 318-340

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/presenters/v3/ErrorGroupPresenter.server.ts` around lines 130
- 145, getState is being called without the task identifier so it can return the
wrong ErrorGroupState when the same fingerprint exists under multiple tasks;
stop fetching summary and state in parallel and instead first await
this.getSummary(...) to read the taskIdentifier, then call
this.getState(environmentId, taskIdentifier, fingerprint) (or otherwise pass the
task identifier into getState) when you run the remaining Promise.all for
getAffectedVersions, getRunList and getState so state lookup is disambiguated by
taskIdentifier.

Comment on lines +104 to +112
async #scheduleErrorAlertEvaluation(projectId: string): Promise<void> {
await alertsWorker.enqueue({
id: `evaluateErrorAlerts:${projectId}`,
job: "v3.evaluateErrorAlerts",
payload: {
projectId,
scheduledAt: Date.now(),
},
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't reuse the periodic job id for the immediate kick.

This helper uses the same evaluateErrorAlerts:${projectId} id that the evaluator's delayed self-chain uses. If a future run is already queued, this enqueue can become a no-op, so newly enabled or edited ERROR_GROUP channels may not be evaluated until the old availableAt.

Suggested fix
   async `#scheduleErrorAlertEvaluation`(projectId: string): Promise<void> {
     await alertsWorker.enqueue({
-      id: `evaluateErrorAlerts:${projectId}`,
+      id: `evaluateErrorAlerts:manual:${projectId}:${Date.now()}`,
       job: "v3.evaluateErrorAlerts",
       payload: {
         projectId,
         scheduledAt: Date.now(),
       },
     });
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async #scheduleErrorAlertEvaluation(projectId: string): Promise<void> {
await alertsWorker.enqueue({
id: `evaluateErrorAlerts:${projectId}`,
job: "v3.evaluateErrorAlerts",
payload: {
projectId,
scheduledAt: Date.now(),
},
});
async `#scheduleErrorAlertEvaluation`(projectId: string): Promise<void> {
await alertsWorker.enqueue({
id: `evaluateErrorAlerts:manual:${projectId}:${Date.now()}`,
job: "v3.evaluateErrorAlerts",
payload: {
projectId,
scheduledAt: Date.now(),
},
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/createAlertChannel.server.ts` around lines
104 - 112, The helper `#scheduleErrorAlertEvaluation` currently enqueues a job
using the same id pattern "evaluateErrorAlerts:${projectId}" that's used by the
evaluator's delayed self-chain, causing deduplication/no-op if a future run is
already queued; change the id for this immediate kick to be unique (for example
append a timestamp or a "now" suffix like
"evaluateErrorAlerts:${projectId}:now:${Date.now()}") so alertsWorker.enqueue
always creates a distinct job when triggered immediately after channel
creation/edit; keep the job name "v3.evaluateErrorAlerts" and payload identical,
only alter the id generation logic in `#scheduleErrorAlertEvaluation`.

Comment on lines +239 to +243
channel: {
type: ProjectAlertChannelType;
properties: unknown;
project: { id: string; externalRef: string; slug: string; name: string; organizationId: string; organization: { slug: string; title: string } };
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Redact webhook URLs from logs and thrown errors.

Webhook URLs commonly embed credentials in the path or query string. Logging webhookProperties.data.url and including it in the thrown error will leak those secrets into worker logs and retry metadata.

Suggested fix
   async `#sendWebhook`(
     channel: {
+      id: string;
       type: ProjectAlertChannelType;
       properties: unknown;
       project: { id: string; externalRef: string; slug: string; name: string; organizationId: string; organization: { slug: string; title: string } };
     },
@@
     if (!response.ok) {
       logger.info("[DeliverErrorGroupAlert] Failed to send webhook", {
         status: response.status,
         statusText: response.statusText,
-        url: webhookProperties.data.url,
+        channelId: channel.id,
       });
-      throw new Error(`Failed to send error group alert webhook to ${webhookProperties.data.url}`);
+      throw new Error("Failed to send error group alert webhook");
     }

Also applies to: 310-316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts` around
lines 239 - 243, The code logs and throws the raw webhook URL
(webhookProperties.data.url) from the channel object (type
ProjectAlertChannelType), which can leak embedded credentials; update
deliverErrorGroupAlert.server.ts to sanitize/redact webhookProperties.data.url
before any logging or including it in thrown errors (e.g., replace
host/path/query credentials with "[REDACTED]" or strip auth/query params) and
ensure any thrown Error messages or processLogger entries use the redacted
value; apply the same change for the other occurrences around the webhook
handling block (also referenced at the second occurrence near lines 310-316) so
no raw webhook URL is emitted.

Comment on lines +30 to +35
export class ErrorAlertEvaluator {
constructor(
protected readonly _prisma: PrismaClientOrTransaction = $replica,
protected readonly _replica: PrismaClientOrTransaction = prisma,
protected readonly _clickhouse: ClickHouse = clickhouseClient
) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Swap the primary and replica defaults here.

_prisma currently defaults to $replica and _replica defaults to prisma, so updateErrorGroupStates() writes through the read client while the reads hit the primary. On replica-backed deployments that will either fail outright or leave alert state changes unpersisted.

Suggested fix
   constructor(
-    protected readonly _prisma: PrismaClientOrTransaction = $replica,
-    protected readonly _replica: PrismaClientOrTransaction = prisma,
+    protected readonly _prisma: PrismaClientOrTransaction = prisma,
+    protected readonly _replica: PrismaClientOrTransaction = $replica,
     protected readonly _clickhouse: ClickHouse = clickhouseClient
   ) {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 30 - 35, The constructor for ErrorAlertEvaluator has its _prisma and
_replica defaults reversed causing writes (e.g., updateErrorGroupStates) to go
to the read-only client; swap the defaults so _prisma defaults to prisma and
_replica defaults to $replica. Update the constructor parameter
order/assignments for ErrorAlertEvaluator accordingly so any write operations
use the primary Prisma client (_prisma = prisma) and reads use the replica
(_replica = $replica).

Comment on lines +102 to +125
for (const alertable of alertableErrors) {
const envChannels = channelsByEnvId.get(alertable.error.environment_id) ?? [];
for (const channel of envChannels) {
await alertsWorker.enqueue({
job: "v3.deliverErrorGroupAlert",
payload: {
channelId: channel.id,
projectId,
classification: alertable.classification,
error: {
fingerprint: alertable.error.error_fingerprint,
environmentId: alertable.error.environment_id,
environmentName: alertable.environmentName,
taskIdentifier: alertable.error.task_identifier,
errorType: alertable.error.error_type,
errorMessage: alertable.error.error_message,
sampleStackTrace: alertable.error.sample_stack_trace,
firstSeen: alertable.error.first_seen,
lastSeen: alertable.error.last_seen,
occurrenceCount: alertable.error.occurrence_count,
},
},
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Longer channel intervals are effectively ignored.

The evaluator schedules the whole project at the shortest configured interval, but the delivery loop still enqueues alerts for every matching channel on every run. That means a channel configured for 60 minutes will still receive alerts every 5 minutes if any sibling channel uses a 5-minute interval.

Also applies to: 212-221, 391-404

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 102 - 125, The loop enqueues alerts to every envChannels entry on every
evaluator run, ignoring per-channel intervals; inside the for (const channel of
envChannels) loop (before calling alertsWorker.enqueue) add a guard that
respects the channel's configured interval by comparing it to the evaluator's
current run interval (e.g., currentRunIntervalMinutes or
evaluationIntervalMinutes): only enqueue when the current run aligns with the
channel interval (for example channel.intervalMinutes ===
currentRunIntervalMinutes or channel.intervalMinutes % currentRunIntervalMinutes
=== 0), otherwise continue; update any related loops mentioned (the other
occurrences around the same pattern at the referenced ranges) to use the same
interval-check guard.

Comment on lines +178 to +197
if (
state.ignoredUntilOccurrenceRate !== null &&
state.ignoredUntilOccurrenceRate !== undefined
) {
const windowMinutes = Math.max(context.windowMs / 60_000, 1);
const rate = context.occurrencesSince / windowMinutes;
if (rate > state.ignoredUntilOccurrenceRate) {
return true;
}
}

if (
state.ignoredUntilTotalOccurrences !== null &&
state.ignoredUntilTotalOccurrences !== undefined &&
state.ignoredAt
) {
if (context.occurrencesSince >= state.ignoredUntilTotalOccurrences) {
return true;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

ignoredUntilTotalOccurrences is only checking the latest window.

getOccurrenceCountsSince() always counts from the current scheduledAt, so a total-occurrence threshold only trips if the entire threshold is exceeded in one evaluation cycle. A rule like "unignore after 100 occurrences" will never fire if the error accrues 20 occurrences across five runs.

Also applies to: 317-343

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 178 - 197, The total-occurrence unignore check uses
context.occurrencesSince (window-only) so a threshold like
ignoredUntilTotalOccurrences won’t fire across multiple evaluation windows;
change the logic to compute the cumulative occurrences since the alert was
ignored and compare that to state.ignoredUntilTotalOccurrences. Specifically,
replace the context.occurrencesSince check with a call that gets counts since
state.ignoredAt (or otherwise compute occurrences since state.ignoredAt using
getOccurrenceCountsSince/your occurrence-count helper) and use that cumulative
count in the if that references state.ignoredUntilTotalOccurrences and
state.ignoredAt.

Comment on lines +372 to +387
await this,
this._prisma.errorGroupState.update({
where: { id: state.id },
data: {
status: "UNRESOLVED",
ignoredUntil: null,
ignoredUntilOccurrenceRate: null,
ignoredUntilTotalOccurrences: null,
ignoredAt: null,
ignoredReason: null,
ignoredByUserId: null,
resolvedAt: null,
resolvedInVersion: null,
resolvedBy: null,
},
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts | sed -n '365,395p'

Repository: triggerdotdev/trigger.dev

Length of output: 1221


🏁 Script executed:

cat -n apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts | sed -n '360,400p'

Repository: triggerdotdev/trigger.dev

Length of output: 1543


Remove the errant await this, comma expression that prevents the state update from being awaited.

Line 372 has await this, which is a comma operator that discards the awaited this reference and returns the unawaited Prisma promise. The update will execute but won't be awaited, allowing the method to return and subsequent operations (like selfChain()) to run before the database write commits. This creates a race condition where state changes are invisible to the job's continuation.

Fix
-      await this,
-        this._prisma.errorGroupState.update({
+      await this._prisma.errorGroupState.update({
           where: { id: state.id },
           data: {
             status: "UNRESOLVED",
             ignoredUntil: null,
             ignoredUntilOccurrenceRate: null,
             ignoredUntilTotalOccurrences: null,
             ignoredAt: null,
             ignoredReason: null,
             ignoredByUserId: null,
             resolvedAt: null,
             resolvedInVersion: null,
             resolvedBy: null,
           },
         });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await this,
this._prisma.errorGroupState.update({
where: { id: state.id },
data: {
status: "UNRESOLVED",
ignoredUntil: null,
ignoredUntilOccurrenceRate: null,
ignoredUntilTotalOccurrences: null,
ignoredAt: null,
ignoredReason: null,
ignoredByUserId: null,
resolvedAt: null,
resolvedInVersion: null,
resolvedBy: null,
},
});
await this._prisma.errorGroupState.update({
where: { id: state.id },
data: {
status: "UNRESOLVED",
ignoredUntil: null,
ignoredUntilOccurrenceRate: null,
ignoredUntilTotalOccurrences: null,
ignoredAt: null,
ignoredReason: null,
ignoredByUserId: null,
resolvedAt: null,
resolvedInVersion: null,
resolvedBy: null,
},
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/alerts/errorAlertEvaluator.server.ts` around
lines 372 - 387, Remove the stray comma-expression "await this, ..." so the
Prisma update is actually awaited: locate the call to
this._prisma.errorGroupState.update (the block that sets status: "UNRESOLVED"
and clears ignored/resolved fields) and change the statement to await the update
promise directly (i.e., await this._prisma.errorGroupState.update(...)) so the
database write completes before continuing (e.g., before calling selfChain()).

project: "my-project",
};

export default function Email(props: AlertErrorGroupEmailProps) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify default export removal in the new email file
rg -nP --type=tsx '^\s*export\s+default\b' internal-packages/emails/emails/alert-error-group.tsx

# Verify named import usage in EmailClient index
rg -nP --type=tsx 'alert-error-group' internal-packages/emails/src/index.tsx -A3 -B3

Repository: triggerdotdev/trigger.dev

Length of output: 96


🏁 Script executed:

# Search for default export in alert-error-group file
rg -n 'export\s+default' internal-packages/emails/emails/alert-error-group.tsx

# Check the import statement in index.tsx
rg -n 'alert-error-group' internal-packages/emails/src/index.tsx -A5 -B2

# Get file line counts to know safe read ranges
wc -l internal-packages/emails/emails/alert-error-group.tsx internal-packages/emails/src/index.tsx

Repository: triggerdotdev/trigger.dev

Length of output: 906


🏁 Script executed:

# Read the alert-error-group file around the export line
sed -n '55,65p' internal-packages/emails/emails/alert-error-group.tsx

# Check if this export is imported/used anywhere else
rg -n 'AlertErrorGroupEmail' internal-packages/emails/ --type=ts --type=tsx

# Check the export in index.tsx around the import
sed -n '1,20p' internal-packages/emails/src/index.tsx

Repository: triggerdotdev/trigger.dev

Length of output: 1261


🏁 Script executed:

# Check all usages of AlertErrorGroupEmail across the codebase
rg -n 'AlertErrorGroupEmail' --max-count=20

# Also check if there are any other references to alert-error-group file
rg -n 'alert-error-group' internal-packages/ --max-count=20

Repository: triggerdotdev/trigger.dev

Length of output: 1203


Replace the default export with a named function export.

Line 58 introduces a default export, which violates the repository export rule. Switch to a named export and update the import in internal-packages/emails/src/index.tsx accordingly.

Suggested patch
- export default function Email(props: AlertErrorGroupEmailProps) {
+ export function AlertErrorGroupEmail(props: AlertErrorGroupEmailProps) {
   const {
     classification,
     taskIdentifier,
     environment,
     error,
     occurrenceCount,
     errorLink,
-import AlertErrorGroupEmail, {
-  AlertErrorGroupEmailSchema,
-} from "../emails/alert-error-group";
+import {
+  AlertErrorGroupEmail,
+  AlertErrorGroupEmailSchema,
+} from "../emails/alert-error-group";

Per coding guidelines: **/*.{ts,tsx,js,jsx}: Use function declarations instead of default exports

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export default function Email(props: AlertErrorGroupEmailProps) {
export function AlertErrorGroupEmail(props: AlertErrorGroupEmailProps) {
const {
classification,
taskIdentifier,
environment,
error,
occurrenceCount,
errorLink,
Suggested change
export default function Email(props: AlertErrorGroupEmailProps) {
import {
AlertErrorGroupEmail,
AlertErrorGroupEmailSchema,
} from "../emails/alert-error-group";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal-packages/emails/emails/alert-error-group.tsx` at line 58, Change the
default export of the component function Email to a named export (use "export
function Email(...)" instead of "export default function Email(...)") and then
update the corresponding import in the emails index module to use a named import
(e.g. import { Email } from '...') or rename on import if needed; also update
any usages that relied on the default import to reference the named export.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants